- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.5k
test_runner: add bail out #56490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
test_runner: add bail out #56490
Conversation
| Review requested: 
 | 
89f64db    to
    55874e8      
    Compare
  
    | 
 Can you explain more about what was flaky? I'm guessing you mean the tests were at different points of execution when they received the bail out signal. I think the best way to work around this is to use test fixtures that never finish. | 
| 
 Hey @cjihrig, you're guessing right! Your proposed solution sounds good. I think we should add a test as follows: 
 WDYT? | 
| 
 There are all sorts of annoying edge cases to account for here, and it might be worth doing a survey of how the tap, mocha, and vitest runners handle bailing out when things are running in parallel. It's much more straightforward when only one thing is running. But, for example, if the very first test in the first process fails, do we bother spawning the other child processes at all? Or, in a bail out situation, how important is it to have an accurate summary at the end of the test run with correct counts for total tests, cancelled tests, etc. | 
55874e8    to
    f1c269c      
    Compare
  
    | @cjihrig I'm still checking different runners and it seems that mostly the behaviour is "non-standard". Mocha stops the execution returning a partial result without reporting the full list of cancelled tests.       const result = await runMochaAsync('options/parallel/test-*', [
        '--parallel',
        '--bail'
      ]);
      // we don't know _exactly_ how many tests will be skipped here
      // due to the --bail, but the number of tests completed should be
      // less than the total, which is 5.
      return expect(
        result.passing + result.pending + result.failing,
        'to be less than',
        5
      );Checking tests like this one I have the impression that the testing of the feature itself follows a "best effort" approach in more than one tool. While I'm still checking other examples I think that the most common use case for the bailout is to stop as soon as possible the execution in a CI/automation env. Regarding the tests I was thinking about: 
 Do you have any suggestions / other behaviours you think we should ensure? | 
| Codecov ReportAttention: Patch coverage is  
 
 Additional details and impacted files@@            Coverage Diff             @@
##             main   #56490      +/-   ##
==========================================
+ Coverage   89.19%   89.21%   +0.01%     
==========================================
  Files         662      662              
  Lines      191893   191984      +91     
  Branches    36937    36964      +27     
==========================================
+ Hits       171164   171280     +116     
+ Misses      13573    13540      -33     
- Partials     7156     7164       +8     
 🚀 New features to boost your workflow:
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. I have some concerns:
- The patch is pretty invasive for a feature that is not part of the default behavior. I'm not sure that it needs to be so invasive.
- There seems to be a good bit of work being done when it is not necessary. Have you been able to do any performance comparisons with these changes.
- The tests (thank you for adding so many tests!) seem to reliant on timers, which usually works fine locally, but is a recipe for flakiness in the CI.
| #handleEvent({ type, data }) { | ||
| switch (type) { | ||
| case 'test:fail': | ||
| if (data.details?.error?.failureType === kTestBailedOut) break; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of needing to check this for every failure, even if bail out mode is not enabled. Couldn't we break out in the test:bail event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but in that case, we would lose the "report' section.
Couldn't we break out in the test:bail event?
Specifically, how would you do that? Would you finalise the reporter, or propagate the abort up to the test runner root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmarchini we need to also account for the fact these changes impact custom reporters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atlowChemi, I think the most expressive approach would be to avoid interrupting the test stream (or taking any action that has a similar effect), as users might still want to list the tests that were not executed because of the bail.
@cjihrig regarding avoiding unnecessary controls: I was thinking about using a set of functions "decorating" the test:fail with the additional check only after a test:bail has been received. This would avoid the control in most cases. wdty?
| for await (const { type, data } of source) { | ||
| switch (type) { | ||
| case 'test:fail': { | ||
| if (data.details?.error?.failureType === lazyLoadTest().kTestBailedOut) break; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here.
Also, why is bailing out only supported in the spec and tap reporters? What about the dot reporter for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't implemented all the reporters yet, as I'm still not convinced by the bail implementation itself.
I'll fix this before landing if we're able to reach an agreement on the implementation 🚀
| this.passed = false; | ||
| this.error = err; | ||
|  | ||
| if (this.bail && !this.root.bailed) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of adding another way to cancel tests when we already have logic for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling root.postRun was the only way I found to immediately propagate the "stop" to the entire test tree.
I'm currently trying to achieve the same using an AbortController, but I haven't found a viable solution yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjihrig: All my doubts regarding the implementation revolve around how to interrupt the whole test tree.
I think other suggestions can be addressed, but I also have the impression that this specific part is the main blocker.
IMHO, bail is a missing and relatively important feature, and I’d like to find the best solution to land this.
Do you or other members of @nodejs/test_runner have any suggestions about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, this is the main blocker for this PR.
I haven't moved forward because I was unable to find an alternative solution.
@cjihrig, is this a blocker from your perspective, or is it something we could address in a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see some type of performance comparisons. I worry about the performance implications of such an invasive change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds reasonable, even though I have the impression that the impact on this point should be minimal.
My biggest concern is about the maintainability of the implementation, specifically reusing root.postRun in an unintended way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we could try is leveraging the existing logic used for filtering. If someone writes only tests, all other tests are completely filtered out. If we extended that logic, I think it might work well at the file level. I haven't experimented with this at all, so I don't know if it would work, but it might be nice to reuse some existing code.
        
          
                test/fixtures/test-runner/bailout/parallel-concurrency/first.mjs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                test/fixtures/test-runner/bailout/parallel-concurrency/first.mjs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                test/fixtures/test-runner/bailout/parallel-concurrency/second.mjs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                test/fixtures/test-runner/bailout/parallel-loading/infinite-loop.mjs
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | Hey @cjihrig, thanks as always for your review. Step by step: 
 Do you have any suggestions on how to reduce the footprint while better integrating it into the current implementation? I'm gaining more confidence in the codebase as I work on it, but I’m sure I might still be missing something important. 
 Not yet, but I’ll make sure to do that as soon as possible! 
 I agree. I’ll try to use your suggested approach to coordinate the files without relying on timers or other flaky solutions. | 
db86232    to
    791fb5e      
    Compare
  
    8833b0c    to
    a32f885      
    Compare
  
    | added: REPLACEME | ||
| --> | ||
|  | ||
| > Stability: 1 - Experimental | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| > Stability: 1 - Experimental | |
| > Stability: 1.0 - Early development | 
| - REPLACEME | ||
| --> | ||
|  | ||
| > Stability: 1 - Experimental | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| > Stability: 1 - Experimental | |
| > Stability: 1.0 - Early development | 
| By enabling this flag, the test runner will cancel all remaining tests | ||
| when it encounters the first failing test. | ||
|  | ||
| Note: The bail option is not currently supported in watch mode. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Note: The bail option is not currently supported in watch mode. | |
| > The bail option is not currently supported in watch mode. | 
Catching up with the last attempt (#48919), this is another try at introducing the bailout feature.
I'm opening this PR as a draft to discuss the implementation and because refactoring may be needed if this approach is well-received by the community.
Note: In some tests, I had to enforce a
concurrency=1setting because testing the bailout feature across multiple files concurrently proved to be extremely flaky.